ABSTRACT

The proposition that the optimal number of options in
a multiple-choice test item is three was examined. The concept of
the functional distractor, a plausible wrong answer that is negatively
discriminating when total test performance is the criterion, is
discussed. Three distinct groups of achievers (high, middle, and low)
on a national standardized achievement test for physicians were
identified. The number of functional distractors was identified for
each sample condition and each item. Results suggest, as have
theoretical analyses, that more functional distractors are desirable.
However, with higher achieving students the number of functional
distractors, and consequently the number of options, had no effect.
The time and effort devoted to developing items with additional options is
probably not worth the gains in item discrimination and reliability
for high and middle achieving examinees. The three-option format is
efficient to construct, and it results in better domain-referenced
measures of achievement. (SLD)
ABSTRACT
Functional Distractors:
Implications for Test-Item Writing and Test Design



A recent review of research on the desirable number of options for a
multiple-choice test item reveals that two or three options may be suitable for
most examinees on an achievement test. Most textbooks recommend four- or
five-option items. The number of options in an item should be based
upon the functionality of each option. In this study, a functional distractor is
defined, research on the optimal number of options is reviewed, and a study is
reported on the number of functional distractors in a high-quality achievement
testing program. This research examines the proposition that the number of
functional distractors per item is optimally around two, and that the three-option
format is not only more efficient to construct but also leads to better
domain-referenced measures of achievement.
Functional Distractors:
Implications for Test-Item Writing and Test Design 

The design of any multiple-choice test item is often guided by one's
experience, common sense, and item-writing lore passed on from mentors or
textbooks. In a review of 46 textbooks treating the topic of writing multiple-choice
test items, Haladyna and Downing (in press, a) reported that authors
often disagreed in their advice on the ideal number of options. Some
recommended four or five, while others recommended producing as many as were
feasible or plausible.

The present study tested the proposition that the optimal number of
options is three. In most well-designed achievement tests, items with more than two
functional distractors are rare, and such additional distractors do not necessarily
contribute to more effective measurement of achievement. If this proposition holds,
test developers at all levels and in all areas would be better served by designing
test items that contain fewer, but more functional, distractors.

The benefits of using fewer distractors are reduced item development
time, shorter tests, and less reading and administration time, while the
desired measurement properties are retained. Additionally, the use of
functional distractors enables the productive use of promising new technologies,
such as polychotomous scoring models (Bock, 1972; Thissen, 1975; Sympson,
1985), to provide better estimates of achievement or ability.

The proposition that three options are optimal for most testing purposes
derives from an analysis of past and current research on this topic from both
theoretical and empirical perspectives.
Research on the Optimal Number of Options

Theoretical Perspectives. Lord (1944) conducted one of the earliest
studies; he developed a formula for predicting changes in reliability as a
function of the number of options added to any multiple-choice item. Lord's
study suggested that a three-option item is optimal for most examinees.
Tversky (1964) reached the same conclusion, based on an analysis of three
criteria (discriminability, power, and information of a test). Studies by Ebel
(1969), Grier (1975, 1976), and again by Lord (1977) support these findings.
Lord's more recent study is the most informative about where on the achievement
scale the three-option item works best. Using item information curves, Lord
(1977) shows that the three-option item provides the most information at the
midrange of the score scale, while the two-option item works best at the upper
range of the score scale. Four- or five-option items work best at the lower
range, where guessing is more frequent and plausible wrong answers are
more likely to prove effective. Levine and Drasgow (1983) generally confirm
Lord's analysis.

Budescu and Nevo (1985) take issue with the law of proportionality, which
is often used in these theoretical studies. This law states that the total
testing time is proportional to the number of items and options on the test.
Their empirical study provides data to refute this law, thus challenging the
validity of the conclusion about the optimality of three options. However,
their data also show that administration time is greater with the use of more
options, and no one can dispute that writing more distractors takes more time.
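The capacity argument behind conclusions of this kind can be sketched under the law of proportionality itself. Assume, for illustration, that the testing-time budget fixes the total number of options N, so a test built from k-option items holds N/k items and can produce k^(N/k) distinct response patterns. Maximizing ln(k)/k gives k = e (about 2.72) on a continuous scale, and k = 3 among whole numbers. This follows the style of argument associated with Tversky (1964), but the function names and the hypothetical budget of 60 options are our own assumptions, not the cited study's.

```python
import math

def log_patterns(k, total_options):
    """Natural log of k ** (total_options / k): the number of distinct
    response patterns when total_options are split into k-option items."""
    return (total_options / k) * math.log(k)

def best_option_count(total_options, candidates=range(2, 7)):
    """Whole number of options per item that maximizes response patterns,
    assuming testing time depends only on the total number of options."""
    return max(candidates, key=lambda k: log_patterns(k, total_options))

if __name__ == "__main__":
    budget = 60  # hypothetical: 60 options in total, e.g. 20 three-option items
    for k in range(2, 7):
        print(k, "options:", round(log_patterns(k, budget), 2))
    print("best:", best_option_count(budget))
```

Two- and four-option designs tie exactly under this argument (2^30 = 4^15), which shows how flat the ln(k)/k curve is near its peak; the argument stands or falls with the proportionality assumption that Budescu and Nevo question.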
Empirical Perspectives. Haladyna and Downing (in press, b) presented a
synthesis of 23 studies on the number of options. Unfortunately, many
researchers have examined mainly item difficulty and have not treated the more
important issues of item discrimination, test score reliability, and validity as
they relate to the choice of the number of options. These studies, limited as they
are, show that slight gains in item discrimination are achieved through the use of
more options, but these gains are not meaningful. None of these studies tested
Lord's analysis that the value of additional options differs as a
function of the achievement level of the examinee, and none of these studies
examined the functionality of distractors as they contribute to measurement or
the improvement of items.

Functional Distractors. While this theoretical and empirical research
suggests using only three options for multiple-choice test items, the
functionality of distractors has not always been considered in these prior
studies. For instance, infrequently selected distractors should be eliminated
because only random guessers choose these options. Such nonfunctional
distractors can be eliminated on logical or empirical grounds.

A multiple-choice test item is usually designed to satisfy content
specifications that are operationalized through a set of objectives, a test
blueprint, or a description of competence in a profession. The assembly of
items into an achievement test therefore provides the basis for content-valid
test score interpretations.

The multiple-choice item is traditionally evaluated on difficulty and
discrimination in addition to content considerations. Classical test theory dictates
that the optimal item has moderate difficulty, with item discrimination being as high
as possible. This condition tends to maximize both the variance of scores in
the distribution and test score reliability.
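Two classical formulas show why moderate difficulty helps. A dichotomous item answered correctly by a proportion p of examinees has variance p(1 - p), which peaks at p = .5, and the KR-20 reliability coefficient (the index reported in Table 1) compares the sum of these item variances with the variance of total scores. The sketch below assumes a complete 0/1 score matrix with no omitted responses; the function names and the small data set are illustrative only.

```python
from statistics import pvariance

def item_variance(p):
    """Variance of a 0/1 item answered correctly by proportion p."""
    return p * (1 - p)

def kr20(scores):
    """KR-20 for a list of examinee rows of 0/1 item scores.

    Uses population (divide-by-n) variances throughout so that the item
    variances p*q and the total-score variance are on the same footing.
    """
    n = len(scores)
    k = len(scores[0])
    p = [sum(row[j] for row in scores) / n for j in range(k)]
    sum_pq = sum(item_variance(pj) for pj in p)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_pq / total_var)

if __name__ == "__main__":
    # Hypothetical 5 examinees x 4 items with a perfect Guttman pattern.
    data = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
    print(round(kr20(data), 3))
    # Item variance over a grid of difficulties: largest at p = .5.
    print(max((item_variance(i / 10), i / 10) for i in range(11)))
```

Very easy or very hard items contribute little variance, so they add little to the total-score variance that reliability depends on; high discrimination matters because it raises the covariances among items that make up the rest of that total variance.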

If each distractor is to work as intended, it should be negatively
discriminating when total test performance is the criterion.
Discrimination is assessed with the point-biserial correlation between
choosing each distractor and total test score; functional distractors should
exhibit negative correlations, and distractors that do not should be carefully
evaluated with respect to their continued use in the test item. Since distractors
are intended to be plausible wrong answers, it seems both illogical and undesirable
for such options to have positive relationships to the total test score.
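A minimal sketch of the point-biserial correlation for one distractor, assuming examinee-level data: a 0/1 indicator of whether each examinee chose the distractor, and each examinee's total score. It uses the textbook form r_pb = (M_p - M_q) / s * sqrt(pq), where M_p and M_q are the mean total scores of those who did and did not choose the option, p is the proportion choosing it, q = 1 - p, and s is the population standard deviation of total scores; with that s, the result equals the ordinary Pearson correlation. Testing whether the value is significantly negative is a separate step not shown here.

```python
from statistics import mean, pstdev

def point_biserial(chose, totals):
    """Point-biserial correlation between a 0/1 choice indicator and totals."""
    picked = [t for c, t in zip(chose, totals) if c]
    others = [t for c, t in zip(chose, totals) if not c]
    s = pstdev(totals)
    if not picked or not others or s == 0:
        return 0.0  # undefined; treated here as no relationship (an assumption)
    p = len(picked) / len(totals)
    return (mean(picked) - mean(others)) / s * (p * (1 - p)) ** 0.5

if __name__ == "__main__":
    # Hypothetical: the two lowest scorers chose the distractor.
    print(round(point_biserial([1, 1, 0, 0], [1, 2, 3, 4]), 3))
```

A distractor chosen mostly by low scorers yields a negative value, as a functional distractor should; one chosen mostly by high scorers yields a positive value and would be flagged for review.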

From the item response theory perspective, items are evaluated on the basis
of unidimensionality, or fit to the dimension under consideration. In item
response theory, items are also evaluated in terms of an information function,
which defines the ability of the item to measure at different levels of
achievement (Lord, 1980). Distractors will often display negatively sloping item
characteristic curves (Sympson, 1985; Thissen, 1975), which are useful in
polychotomous scoring models.
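To make the idea of an information function concrete, the sketch below assumes the three-parameter logistic (3PL) model, which the text does not specify: a is discrimination, b is difficulty, c is the lower asymptote often associated with guessing, and D = 1.7 is the conventional scaling constant. Item information is (D a)^2 (Q/P) ((P - c)/(1 - c))^2, where P is the probability of a correct answer at achievement level theta and Q = 1 - P. Setting c = 1/k for a k-option item is only a rough, hypothetical convention used here to show how a higher guessing floor lowers information; it is not the analysis in Lord (1977).

```python
import math

D = 1.7  # conventional logistic scaling constant

def p_correct(theta, a, b, c):
    """3PL probability of a correct response at achievement level theta."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c):
    """3PL item information at theta."""
    p = p_correct(theta, a, b, c)
    q = 1 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1 - c)) ** 2

if __name__ == "__main__":
    # Hypothetical item (a = 1, b = 0) with c = 1/k for k = 2, 3, 5 options.
    for k in (2, 3, 5):
        row = [round(item_information(t, 1.0, 0.0, 1 / k), 3) for t in (-2, -1, 0, 1, 2)]
        print(k, "options:", row)
```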

Since the objective of measurement is to reduce error so that valid
interpretations can be made from test data, the study and control of distractors
is justified in the framework of either classical or item response theory, and,
whether scoring is dichotomous or polychotomous, distractors deserve careful
construction and thoughtful evaluation.

If the item characteristic curve for the keyed answer is positively sloping, the
collective item characteristic curve for the wrong options must be negatively
sloping, because at every achievement level the proportions choosing the key and
the wrong options sum to one. This property can be illustrated in the following item:
OPTIONS (percentage of each third choosing each option; * = keyed answer)

                 A     B     C*    D     E     A+B+D+E
Upper Third      4     0     68    11    17    32
Middle Third     10    2     62    11    15    38
Lower Third      28    3     48    13    8     52

The three performance levels for the correct response, C, mark the
approximate item characteristic curve. Figure 1-a illustrates a traditional item
characteristic curve. The sum of responses to the distractors (A, B, D, and E)
marks an approximate negative item characteristic curve. Distractor A is
functional because it displays a negatively sloping item characteristic curve and
a negative point-biserial relationship. Figure 1-b illustrates a desirable
negatively sloping item characteristic curve.

If an option is not often selected by examinees, one should question that
option's usefulness as a distractor. Thus, simple frequency of selection of an
option can serve as one criterion for evaluating functionality. Seldom-selected
options should be removed, since useless options add to the length of the item,
the administration time, and the amount of extraneous reading. Option B has
this characteristic in the above example. Figure 1-c illustrates this condition.

Figure 1-d illustrates another instance of a nonfunctional distractor, one
that attracts sufficient responses but shows no intelligible
pattern as a function of student achievement. The point-biserial relationship
between choosing this distractor and total test performance is close to
zero. Consequently, this distractor adds no information in polychotomous
scoring. Option D has this characteristic in the example above.
Insert Figure 1 here

The last example, option E, has a positive item characteristic curve and
a positive point-biserial correlation with the total score. Given that distractors
should have a negatively sloping item characteristic curve, this option interferes
with the measurement objective by supplying a source of error.

To summarize, a functional distractor has the following characteristics:

1. A significant negative point-biserial relationship to the total test score;

2. A negatively sloping item characteristic curve; and

3. A frequency of response greater than 5% for the total group of
examinees.

Dysfunctional distractors include all other conditions, which are illustrated in
the example. These are options that (1) are infrequently selected, (2) have no
statistically significant relationship to the criterion, or (3) have a positive,
significant relationship to the criterion.
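The frequency and slope criteria can be applied mechanically to the example item above. The sketch below assumes the three thirds are roughly equal in size (so overall frequency is the mean of the three percentages) and uses the change from the lower to the upper third as a crude slope; the band of plus or minus 5 percentage points for calling a curve flat is a hypothetical threshold, not one from the text. The point-biserial criterion is omitted because it needs examinee-level data rather than these group percentages.

```python
# Percentage of each third choosing each option in the example item (key: C).
EXAMPLE = {
    "A": {"upper": 4, "middle": 10, "lower": 28},
    "B": {"upper": 0, "middle": 2, "lower": 3},
    "C": {"upper": 68, "middle": 62, "lower": 48},
    "D": {"upper": 11, "middle": 11, "lower": 13},
    "E": {"upper": 17, "middle": 15, "lower": 8},
}
KEY = "C"
MIN_PERCENT = 5.0  # criterion 3: chosen by more than 5% overall
FLAT_BAND = 5.0    # hypothetical: |upper - lower| under 5 points counts as flat

def collective_distractors(third):
    """Summed percentage choosing any wrong option in one third."""
    return sum(p[third] for opt, p in EXAMPLE.items() if opt != KEY)

def classify(option):
    """Label a distractor using the frequency and crude-slope criteria."""
    pct = EXAMPLE[option]
    overall = (pct["upper"] + pct["middle"] + pct["lower"]) / 3  # assumes equal thirds
    slope = pct["upper"] - pct["lower"]  # crude ICC slope, in percentage points
    if overall <= MIN_PERCENT:
        return "seldom chosen"
    if slope >= FLAT_BAND:
        return "positively sloping"
    if slope > -FLAT_BAND:
        return "flat"
    return "functional"

if __name__ == "__main__":
    for opt in "ABDE":
        print(opt, classify(opt))
    for third in ("upper", "middle", "lower"):
        # Wrong-option percentages mirror the key: together they make 100.
        print(third, collective_distractors(third), 100 - EXAMPLE[KEY][third])
```

With these assumptions the sketch labels A functional, B seldom chosen, D flat, and E positively sloping, matching the verbal analysis of the example.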

The Present Study

If the optimal number of options is three, then we would expect several 
conditions to be present in standardized achievement tests. 

1. The most frequently observed number of functional distractors for a 
representative sample of items should be two. For high achievers this number
should be one, and for low achievers this number should be three or four, 
because low achievers are more likely to guess and thus display a greater 
tendency to be drawn to plausible distractors. 

2. A negative relationship between the number of functional distractors
and item difficulty for a heterogeneous sample is predicted, because low
achievers tend to use more options than high achievers. There is also a natural
limit on functional distractors: easy items have distractors that are seldom
used, so one would expect such items to exhibit few functional distractors.
With the lower achieving students, we would expect a negative relationship or no
relationship between the number of functional distractors and difficulty. Faulty
items would be likely to have few or no functional distractors.

3. The relationship between the number of functional distractors and
item discrimination should be positive with a heterogeneous sample: the more
functional distractors, the more likely the item will discriminate. This premise
generally supports the use of more options. However, when examining the usefulness
of functional distractors with high achievers, this relationship should not be
present. With low achievers, this relationship should be more pronounced,
because plausible, functional distractors are more likely to be useful.

4. The mean difficulty and mean discrimination are a function of the
number of functional distractors for each item. With difficulty, items with no
functional distractors are likely to be easy. Items with one or two functional
distractors should be moderate in difficulty, while items with three or four
functional distractors should be difficult.

For discrimination, items with no functional distractors are either flawed
or very easy, and their discrimination should be low. Items with two functional
distractors should have moderate to high discrimination, while items with three
or four functional distractors should have higher discrimination.

However, the analysis based on the entire sample may be misleading, since
the theoretical research reviewed earlier predicts that items with two functional
distractors are best for high achievers and that three and four functional
distractors serve best for middle and low achievers. This study separated three
distinct groups of achievers (high, middle, and low), and an analysis determined
in which groups two, three, or four functional distractors worked best.

METHOD 

Data Base 

The data for this study came from a national standardized achievement 
test for physicians. This test is part of a continuing education program and is 
given annually to over 1,000 specialists in a surgical specialty of medicine. The
standards for item and test development are quite high, and include such
activities as training of item writers, use of detailed content specifications, item 
review by colleagues, editorial and psychometric reviews, pretesting, and 
selection of items by a panel of national experts. 
Analysis of Data

The sample of 1,111 examinees was divided by total test score into three
approximately equal groups for the purposes of this study. The number of functional
distractors was identified for each sample condition and each item. As stated
before, a distractor was classified as nonfunctional if it showed (1) no negatively
sloping item characteristic curve, (2) no negative correlation between distractor
choice and test performance, or (3) selection by less than 5% of the examinees.
Tabulations were made of the number of items containing zero, one, two, three,
and four functional distractors for each of the four sample conditions (the total
sample and each third). Analyses of variance were done for item difficulty and
discrimination, with the number of functional distractors as the independent
variable, for each sample condition.
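The tabulation step can be sketched as a per-item routine, assuming examinee-level records of the option each examinee chose and each examinee's total score. Two simplifications are assumptions of this sketch, not the study's procedure: the slope of the item characteristic curve is approximated by comparing the upper and lower thirds of whichever sample is being analyzed, and "significantly negative" is replaced by a fixed point-biserial cutoff of -.10. All helper names are hypothetical.

```python
import math

MIN_SHARE = 0.05  # criterion 3: chosen by more than 5% of the sample
MAX_RPB = -0.10   # hypothetical stand-in for a "significantly negative" r_pb

def pearson(x, y):
    """Pearson correlation; with a 0/1 x this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation: treated here as no relationship
    return sxy / math.sqrt(sxx * syy)

def thirds(totals):
    """Indices of the lower, middle, and upper thirds by total score."""
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    n = len(order)
    return order[: n // 3], order[n // 3 : 2 * n // 3], order[2 * n // 3 :]

def subsample(choices, totals, idx):
    """Choices and totals restricted to the examinees in idx."""
    return [choices[i] for i in idx], [totals[i] for i in idx]

def count_functional(choices, totals, key, options="ABCDE"):
    """Number of distractors of one item meeting all three criteria."""
    n = len(choices)
    lower, _, upper = thirds(totals)
    count = 0
    for opt in options:
        if opt == key:
            continue
        chose = [1 if c == opt else 0 for c in choices]
        share = sum(chose) / n
        # Crude ICC slope: choice rate in the upper third minus the lower third.
        slope = (sum(chose[i] for i in upper) / len(upper)
                 - sum(chose[i] for i in lower) / len(lower))
        if share > MIN_SHARE and slope < 0 and pearson(chose, totals) <= MAX_RPB:
            count += 1
    return count

if __name__ == "__main__":
    # Hypothetical item keyed C, 30 examinees with totals 1..30.
    totals = list(range(1, 31))
    choices = ["A"] * 8 + ["C"] * 16 + ["E"] * 3 + ["C"] * 3
    lower, middle, upper = thirds(totals)
    print("Total", count_functional(choices, totals, "C"))
    for name, idx in (("Upper 1/3", upper), ("Middle 1/3", middle), ("Lower 1/3", lower)):
        sub_choices, sub_totals = subsample(choices, totals, idx)
        print(name, count_functional(sub_choices, sub_totals, "C"))
```

To mirror the four sample conditions, the routine runs once on the full sample and once on each third, using only that third's records; each subsample needs at least three examinees so that its own thirds are non-empty.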

RESULTS AND DISCUSSION 
The top of Table 1 provides descriptive statistics from the data base for 
the total sample and each subsample. The test was moderately difficult, scores
ranged widely, and the overall KR-20 estimate of reliability was .91. The
variability of scores in the upper and middle thirds was considerably
more restricted than in the lower third.

*******************

Insert Table 1 here

*******************

The first hypothesis dealt with the frequency of functional distractors in a
well-developed achievement test. Table 1 provides information about the
extent to which these items had functional distractors. For the total sample,
there were only 11 items that had four functional distractors, while 73 items
had two functional distractors. Based on the criteria for functionality and on
the results with this specific test, two functional distractors per item seems to
be typical.

When the number of functional distractors is determined from item
analyses based on each sample condition, a different pattern of results emerges.
For the upper third, one functional distractor per item was most often
noted; no items had four functional distractors. For the middle 1/3, the
pattern was similar to that of the upper third. For the lower third, one and
two functional distractors appear most often, while only one item had four
functional distractors.

A restriction in the range of scores will attenuate discrimination, thereby
making the detection of functional distractors less likely.
However, if a distractor is to discriminate between high and low achieving
examinees at any level, it must be functional. In this study, one or two
functional distractors were typical for the lower achieving sample.
Item difficulty. For the total sample, there is a systematic negative
relationship between item difficulty (the proportion of examinees answering
correctly, so lower values mean harder items) and the number of functional
distractors. As expected, items with no functional distractors are easy, while
those with three or four functional distractors are the most difficult. If we use
representativeness as the criterion for evaluating items, the items with the
mean difficulty closest to the overall test difficulty of .688 were those
having two functional distractors. Fewer functional distractors resulted in
easier items; more functional distractors resulted in harder items.

Looking at these results within each sample condition, with the upper
third group, the most representative set of items had one functional distractor,
again supporting Lord's contention that two options are optimal for high
achievers (Lord, 1977). Similarly, with the middle group, items with one functional
distractor came closest to the average difficulty. With the lower group, it seemed
immaterial how many functional distractors there were, as each category yielded
item difficulties at about the same level (F = 0.48, p = .751).
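The representativeness criterion amounts to picking, in each row of Table 1, the category whose mean item difficulty lies closest to a reference value. The sketch below assumes that the overall .688 serves as the reference for every row; the text states this only for the total sample, but the same choice reproduces the conclusions stated for the upper and middle thirds. Empty categories (no items) are skipped, and the names are illustrative.

```python
# Mean item difficulty by number of functional distractors (zero..four),
# copied from Table 1; None marks a category with no items.
DIFFICULTY = {
    "Total": [0.819, 0.752, 0.675, 0.626, 0.637],
    "Upper 1/3": [0.758, 0.671, 0.627, 0.538, None],
    "Middle 1/3": [0.756, 0.684, 0.622, 0.573, None],
}
LABELS = ["zero", "one", "two", "three", "four"]

def most_representative(row, reference=0.688):
    """Label of the non-empty category whose mean is closest to reference."""
    gaps = [(abs(m - reference), LABELS[i]) for i, m in enumerate(row) if m is not None]
    return min(gaps)[1]

if __name__ == "__main__":
    for sample, row in DIFFICULTY.items():
        print(sample, most_representative(row))
```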

Item discrimination. As noted in the lower portion of Table 1, the
greater the number of functional distractors in the item, the higher the
discrimination. The trend is in a positive direction, and the correlation
between item discrimination and the number of functional distractors for the
total sample is .333. This finding supports the proposition that three or four
functional distractors are desirable.

However, for the upper 1/3 of the sample, discrimination is nearly uniform
across the number of functional distractors. That is, there is no relationship
between the number of functional distractors and item discrimination in this
upper level sample (r = -.001). With the middle 1/3, there is a tendency to
favor more distractors, and this tendency is much stronger in the lower 1/3 sample.
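The effect sizes in Table 1 are not labeled, but they appear consistent with eta squared, the share of variance in an item statistic explained by category membership. For a one-way analysis of variance, eta squared can be recovered from F as F df_b / (F df_b + df_w). Taking the degrees of freedom from the counts in Table 1 (200 items; empty categories dropped, giving five groups for the total sample and the lower third and four groups for the upper and middle thirds) reproduces the reported values to within rounding, for example .187, .174, and .329. The ANOVA function is a plain sketch with hypothetical names, not the software used in the study.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of groups."""
    groups = [g for g in groups if g]  # drop empty categories
    values = [x for g in groups for x in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b = len(groups) - 1
    df_w = len(values) - len(groups)
    f = (ss_between / df_b) / (ss_within / df_w)
    return f, df_b, df_w, ss_between / (ss_between + ss_within)

def eta_squared_from_f(f, df_b, df_w):
    """Eta squared implied by a one-way F statistic and its degrees of freedom."""
    return f * df_b / (f * df_b + df_w)

if __name__ == "__main__":
    # Difficulty, total sample: five non-empty groups, 200 items.
    print(round(eta_squared_from_f(11.24, 4, 195), 3))
    # Difficulty, upper third: four non-empty groups.
    print(round(eta_squared_from_f(13.80, 3, 196), 3))
    # Discrimination, lower third: five non-empty groups.
    print(round(eta_squared_from_f(23.93, 4, 195), 3))
```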
Overall, these results show, as others have suggested in theoretical
analyses, that more functional distractors are desirable, yet with higher
achieving students, the number of functional distractors, and hence the number
of options, has no effect.

CONCLUSIONS 

Research on the number of options over the past 60 years has suggested
using fewer options, with three being desirable for most measurement purposes.
The strongest rationale for this recommendation is greater efficiency, obtained
by preparing fewer distractors and administering items that take up less space
and require less reading, thus reducing administration time. This study has
supported the proposition that fewer options are desirable in some
circumstances. More important, while it may be argued that the use of more
functional distractors leads to items that are more discriminating, this
research has also shown that items with four functional distractors are rare, at
least in this sample. Are the time and effort devoted to five-option item
development and testing worth the gains in item discrimination and reliability?
Probably not with high or middle achieving examinees.

As testing moves toward more adaptive procedures, polychotomous scoring,
and wider implementation of item response theories, the idea of functional
distractors should lead test developers to the conclusion that fewer options of
higher quality will produce better test scores than five-option items containing,
on average, only two functional distractors. Item writers may wish to
develop more options, but the continued use of so many nonfunctional
distractors provides no advantage over the three-option format with
two functional distractors.
Also, there are emerging theories of item development that focus on the
functionality of distractors from logical, judgmental, and theoretical bases
(Tatsuoka, 1983; Tatsuoka and Tatsuoka, 1982, 1983; Webb, Herman, and Cabello,
1986). Statistical and cognitive learning theories propose
integrating teaching and testing to help test designers build better items,
and hence better tests (Roid and Haladyna, 1982). Consideration of the
functionality of distractors and its application to item design and analysis
should contribute to this growing technology for achievement testing.
Table 1

Descriptive Statistics from the Data Base

Sample        Sample            Stan.   Item    Item
Conditions    Size     Mean     Dev.    Diff.   Disc.   KR-20
Total         1,111    137.6    20.6    .688    .236    .91
Upper 1/3       371    159.0     7.5    .795    .101    .49
Middle 1/3      370    139.6     5.0    .698    .059    --
Lower 1/3       370    114.3    13.8    .571    .147    .77

Distribution of Number of Functional Distractors for Varying Samples

Sample        Number of Functional Distractors
Conditions    Zero    One    Two    Three    Four
Total           13     49     73       54      11
Upper 1/3       67     90     37        6       0
Middle 1/3      62     85     44        9       0
Lower 1/3       22     70     78       29       1

Analyses of Variance on Item Difficulty and Item Discrimination
Using Number of Functional Distractors as the Independent Variable

Item Difficulty
Sample        Number of Functional Distractors                        Effect
Conditions    Zero    One     Two     Three   Four    F       p       Size
Total         .819    .752    .675    .626    .637    11.24   .001    .187
Upper 1/3     .758    .671    .627    .538    --      13.80   .001    .174
Middle 1/3    .756    .684    .622    .573    --      13.15   .001    .167
Lower 1/3     .704    .700    .677    .676    .760     0.48   .751    --

Item Discrimination
Sample        Number of Functional Distractors                        Effect
Conditions    Zero    One     Two     Three   Four    F       p       Size
Total         .162    .208    .238    .268    .311     6.20   .001    .113
Upper 1/3     .232    .241    .233    .222    --       0.16   .926    --
Middle 1/3    .200    .240    .264    .307    --       5.76   .001    .081
Lower 1/3     .125    .205    .258    .337    --      23.93   .001    .329
Figure 1-a: Typical item characteristic curve
Figure 1-b: Negatively sloping item characteristic curve
Figure 1-c: Low response item characteristic curve
Figure 1-d: Flat item characteristic curve
[Plots not recoverable; vertical axes run from 0.00 to 1.00.]